1.
Sci Rep; 14(1): 7028, 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38528062

ABSTRACT

Accurate indel calling plays an important role in precision medicine. A benchmarking indel set is essential for thoroughly evaluating the indel calling performance of bioinformatics pipelines. A reference sample with a set of known-positive variants was developed in the FDA-led Sequencing Quality Control Phase 2 (SEQC2) project, but the known indels in the known-positive set were limited. This project sought to provide an enriched set of known indels that would be more translationally relevant by focusing on additional cancer related regions. A thorough manual review process completed by 42 reviewers, two advisors, and a judging panel of three researchers significantly enriched the known indel set by an additional 516 indels. The extended benchmarking indel set has a large range of variant allele frequencies (VAFs), with 87% of them having a VAF below 20% in reference Sample A. The reference Sample A and the indel set can be used for comprehensive benchmarking of indel calling across a wider range of VAF values in the lower range. Indel length was also variable, but the majority were under 10 base pairs (bps). Most of the indels were within coding regions, with the remainder in the gene regulatory regions. Although high confidence can be derived from the robust study design and meticulous human review, this extensive indel set has not undergone orthogonal validation. The extended benchmarking indel set, along with the indels in the previously published known-positive set, was the truth set used to benchmark indel calling pipelines in a community challenge hosted on the precisionFDA platform. This benchmarking indel set and reference samples can be utilized for a comprehensive evaluation of indel calling pipelines. Additionally, the insights and solutions obtained during the manual review process can aid in improving the performance of these pipelines.
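
As a rough illustration of the VAF figures quoted in this abstract, the sketch below computes a variant allele frequency from read counts and the share of variants falling under a 20% threshold. The function names and the read counts are hypothetical and are not taken from the SEQC2 data or from any specific pipeline.

```python
from typing import Iterable


def variant_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def fraction_below(vafs: Iterable[float], threshold: float = 0.20) -> float:
    """Share of variants whose VAF is strictly below the threshold."""
    values = list(vafs)
    if not values:
        raise ValueError("at least one VAF is required")
    return sum(v < threshold for v in values) / len(values)


# Hypothetical (alt_reads, total_reads) pairs, not data from the study.
example_counts = [(5, 100), (12, 100), (30, 100), (45, 90)]
example_vafs = [variant_allele_frequency(a, t) for a, t in example_counts]
low_vaf_share = fraction_below(example_vafs)
```

With real calls, the same two functions could be applied to the alternate and total read depths reported per indel, assuming those counts are available in the caller's output.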


Subjects
Benchmarking , High-Throughput Nucleotide Sequencing , Humans , Computational Biology , Quality Control , INDEL Mutation , Single Nucleotide Polymorphism
2.
Genome Biol ; 24(1): 270, 2023 Nov 27.
Article in English | MEDLINE | ID: mdl-38012772

ABSTRACT

BACKGROUND: Genomic DNA reference materials are widely recognized as essential for ensuring data quality in omics research. However, relying solely on reference datasets to evaluate the accuracy of variant calling results is incomplete, as they are limited to benchmark regions. Therefore, it is important to develop DNA reference materials that enable the assessment of variant detection performance across the entire genome.

RESULTS: We established a DNA reference material suite from four immortalized cell lines derived from a family of parents and monozygotic twins. Comprehensive reference datasets of 4.2 million small variants and 15,000 structural variants were integrated and certified for evaluating the reliability of germline variant calls inside the benchmark regions. Importantly, the genetic built-in truth of the Quartet family design enables estimation of the precision of variant calls outside the benchmark regions. Using the Quartet reference materials along with study samples, batch effects are objectively monitored and alleviated by training a machine learning model with the Quartet reference datasets to remove potential artifact calls. Moreover, the matched RNA and protein reference materials and datasets from the Quartet project enable cross-omics validation of variant calls from multiomics data.

CONCLUSIONS: The Quartet DNA reference materials and reference datasets provide a unique resource for objectively assessing the quality of germline variant calls throughout the whole genome and improving the reliability of large-scale genomic profiling.
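
The "built-in truth" of a parents-plus-monozygotic-twins design can be pictured as a consistency check on genotype calls. The sketch below is one plausible reading, not the authors' published procedure: it assumes diploid, biallelic genotypes encoded as allele pairs (0 for reference, 1 for alternate), and all function names are hypothetical.

```python
from itertools import product
from typing import Tuple

Genotype = Tuple[int, int]  # unphased allele pair, e.g. (0, 1)


def mendelian_consistent(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """True if the child could inherit one allele from each parent."""
    return any(
        sorted((f, m)) == sorted(child) for f, m in product(father, mother)
    )


def quartet_consistent(
    father: Genotype, mother: Genotype, twin1: Genotype, twin2: Genotype
) -> bool:
    """Monozygotic twins must agree, and their shared genotype must be
    compatible with Mendelian inheritance from the parents."""
    twins_agree = sorted(twin1) == sorted(twin2)
    return twins_agree and mendelian_consistent(twin1, father, mother)


# Hypothetical calls at three sites (father, mother, twin1, twin2).
site_ok = quartet_consistent((0, 1), (0, 0), (0, 1), (1, 0))
site_twin_mismatch = quartet_consistent((0, 1), (0, 0), (0, 1), (0, 0))
site_not_inherited = quartet_consistent((0, 0), (0, 0), (0, 1), (0, 1))
```

Calls that fail such a check are candidates for artifacts, although a small number of genuine de novo mutations would fail it too, so the failure rate is only an approximate error estimate.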


Subjects
Benchmarking , Human Genome , Humans , Reproducibility of Results , Single Nucleotide Polymorphism , Germ Cells , High-Throughput Nucleotide Sequencing/methods
3.
Nat Biotechnol ; 2023 Sep 07.
Article in English | MEDLINE | ID: mdl-37679543

ABSTRACT

Characterization and integration of the genome, epigenome, transcriptome, proteome and metabolome of different datasets is difficult owing to a lack of ground truth. Here we develop and characterize suites of publicly available multi-omics reference materials of matched DNA, RNA, protein and metabolites derived from immortalized cell lines from a family quartet of parents and monozygotic twin daughters. These references provide built-in truth defined by relationships among the family members and the information flow from DNA to RNA to protein. We demonstrate how using a ratio-based profiling approach that scales the absolute feature values of a study sample relative to those of a concurrently measured common reference sample produces reproducible and comparable data suitable for integration across batches, labs, platforms and omics types. Our study identifies reference-free 'absolute' feature quantification as the root cause of irreproducibility in multi-omics measurement and data integration and establishes the advantages of ratio-based multi-omics profiling with common reference materials.
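
The ratio-based idea described in this abstract can be sketched in a few lines: each feature of a study sample is expressed relative to the same feature in a reference sample measured in the same batch. The log2 transform, the dictionary layout, and the feature names below are assumptions made for illustration; the abstract itself only specifies scaling relative to a common reference.

```python
import math
from typing import Dict


def ratio_profile(study: Dict[str, float], reference: Dict[str, float]) -> Dict[str, float]:
    """Log2 ratio of study to reference for features that are present
    and strictly positive in both samples."""
    return {
        feature: math.log2(value / reference[feature])
        for feature, value in study.items()
        if value > 0 and reference.get(feature, 0) > 0
    }


# Hypothetical absolute values for the same two samples in two batches.
# Batch 2 reads every feature three times higher (a multiplicative batch effect).
batch1_study = {"geneA": 20.0, "geneB": 5.0}
batch1_reference = {"geneA": 10.0, "geneB": 10.0}
batch2_study = {"geneA": 60.0, "geneB": 15.0}
batch2_reference = {"geneA": 30.0, "geneB": 30.0}

profile_batch1 = ratio_profile(batch1_study, batch1_reference)
profile_batch2 = ratio_profile(batch2_study, batch2_reference)
```

In this toy case the absolute values differ threefold between batches while the ratio profiles coincide, which is the property that makes ratio-scaled data easier to integrate. Real batch effects are not purely multiplicative, so the cancellation would be only partial.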

4.
J Phys Chem Lett ; 14(1): 148-157, 2023 Jan 12.
Article in English | MEDLINE | ID: mdl-36579474

ABSTRACT

Currently, computational materials science involves human-computer interaction through coding in software or neural networks. There is still no direct way to incorporate human intelligence. The digitalization of human intelligence should be the ultimate goal for many disciplines. In materials science, human intelligence still cannot be replaced by machine learning techniques, because humans can deal with complex correlations in the real world. We design the framework of Mateverse, a materials science computation platform based on the Metaverse, which unifies human intelligence, experimental data, and theoretical simulations. In Mateverse, we intensively study the properties of H2O, including the liquid and solid phases. We show that we can optimize a new water force field (which we name TIP4P-Meta) directly from the interactions between humans and the visible properties of H2O. This force field is validated to be better than the conventional water model, and new ice polymorphs can be generated. We believe our platform can provide valuable hints for the paradigm upgrade in future computational materials science development.


Subjects
Artificial Intelligence , Materials Science , Humans , Software , Computer Neural Networks , Water
5.
Sci Data; 9(1): 587, 2022 Sep 24.
Article in English | MEDLINE | ID: mdl-36153392

ABSTRACT

Molecular subtyping of triple-negative breast cancer (TNBC) is essential for understanding the mechanisms and discovering actionable targets of this highly heterogeneous type of breast cancer. We previously performed a large single-center multiomics study consisting of genomics, transcriptomics, and clinical information from 465 patients with primary TNBC. To facilitate reuse of this unique dataset, we provide a detailed description of the dataset in this study, with special attention to data quality. The multiomics data were generally of high quality, but a few sequencing datasets had quality issues that should be noted in subsequent reuse. Furthermore, we reran the data analyses with updated pipelines and an updated version of the human reference genome (hg38 instead of hg19). The updated profiles were in good concordance with those previously published in terms of gene quantification, variant calling, and copy number alteration. Additionally, we developed a user-friendly web-based database for convenient access and interactive exploration of the dataset. Our work will facilitate reuse of the dataset, maximize the value of the data, and further accelerate cancer research.


Subjects
Transcriptome , Triple Negative Breast Neoplasms , DNA Copy Number Variations , Female , Human Genome , Genomics , Humans , Triple Negative Breast Neoplasms/genetics
6.
J Phys Chem Lett ; 12(37): 9124-9131, 2021 Sep 23.
Article in English | MEDLINE | ID: mdl-34523944

ABSTRACT

Magic-angle twisted bilayer graphene (MATBG) has recently attracted intensive research attention because of its fascinating and unconventional electronic properties. Herein, we claim that the magic-angle phenomenon originates from the Heisenberg uncertainty principle, which can explain in detail the finite-size effect and the twist-dependent low-energy band variations. We showed that flat bands could exist only near the AA stacking structure rather than AB. The finite-size effect gives the minimal size of graphene quantum dots (R ≳ 4 nm) for the emergence of the Dirac point, and the uncertainty relation provides the upper bound for moiré supercells (R ≲ 23.5 nm) in twisted bilayer graphene, which is the quantum mechanical boundary for the emergence of flat bands. Combining these bounds with the twist dependence of the moiré supercell size, we proved that there is only one possible magic angle in MATBG, at θ ≈ 1.1°. Our result implies that the unconventional phenomena in MATBG originate from a fundamental feature of condensed matter physics.
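
The "twist dependence of moiré supercell size" mentioned in this abstract can be made concrete with the standard geometric relation for two identical lattices rotated by a small angle, L = a / (2 sin(θ/2)). The sketch below evaluates it using the textbook graphene lattice constant of 0.246 nm. This is the generic moiré period, which is an assumption here: it is not necessarily the quantity R used by the authors, whose exact definition the abstract does not give.

```python
import math

# Textbook in-plane lattice constant of graphene, in nanometres.
GRAPHENE_LATTICE_CONSTANT_NM = 0.246


def moire_period_nm(
    twist_angle_deg: float,
    lattice_constant_nm: float = GRAPHENE_LATTICE_CONSTANT_NM,
) -> float:
    """Moiré period L = a / (2 sin(theta / 2)) for a twist angle in degrees."""
    if twist_angle_deg <= 0:
        raise ValueError("twist angle must be positive")
    half_angle_rad = math.radians(twist_angle_deg) / 2
    return lattice_constant_nm / (2 * math.sin(half_angle_rad))


# Period at the magic angle quoted in the abstract (about 12.8 nm).
period_at_magic_angle = moire_period_nm(1.1)
```

Because the period grows as the angle shrinks, an upper bound on the supercell size corresponds to a lower bound on the twist angle, and a lower bound on the size to an upper bound on the angle.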

7.
Sci Data; 7(1): 400, 2020 Nov 18.
Article in English | MEDLINE | ID: mdl-33208742

ABSTRACT

In materials science, quantum chemistry databases play an indispensable role in determining the structure and properties of new material molecules and in deep learning in this field. A new quantum chemistry database, QM-sym, was set up in our previous work. QM-sym is an open-access database focusing on transition states, energy, and orbital symmetry. In this work, we put forward QM-symex, with 173 thousand molecules. Each organic molecule in QM-symex combines with the Cnh symmetry composite and contains the information of the first ten singlet and triplet transitions, including energy, wavelength, orbital symmetry, oscillator strength, and other quasi-molecular properties. QM-symex serves as a benchmark for quantum chemical machine learning models and can be used to train new models of excited states in quantum chemistry, as well as to contribute to further development of the green energy revolution and materials discovery.
